Presentation: "Data Science at Scale with Spark"

Time: Tuesday 12:05 - 12:55 / Location: Grand Ballroom C

Apache Spark has been blessed as the replacement for MapReduce in Hadoop environments, and it also runs in other deployment modes. Spark provides better performance and better developer productivity, and it supports a wider range of application scenarios than MapReduce, including event stream processing, ad hoc queries, graphs, and iterative algorithms. Graphs are a natural way to represent many data sets, such as social media networks, and iterative algorithms are important for Machine Learning, for example in model training with gradient descent.

This talk discusses Spark from a Data Science perspective: its strengths and weaknesses, the Scala, Java, Python, and R APIs it offers for common analytics problems, what's missing, and what's planned. We'll look at support for ad hoc queries over large data sets, machine learning algorithms, graph processing, the programmer experience, and the pragmatic concerns of running applications.


Dean Wampler, Big Data Architect at Typesafe & O'Reilly Author

Biography: Dean Wampler

Dean Wampler is the Big Data Architect at Typesafe and specializes in the application of Functional Programming principles to “Big Data” applications, using Hadoop and alternative technologies. Dean is a contributor to several open-source projects and the founder of the Chicago-Area Scala Enthusiasts. He is the author of Functional Programming for Java Developers, the co-author of Programming Scala, and the co-author of Programming Hive, all from O’Reilly. He pontificates on Twitter, @deanwampler, and at polyglotprogramming.com.

Twitter: @deanwampler